Combining Top-down and Bottom-up Search for Unsupervised Induction of Transduction Grammars
Abstract
We show that combining bottom-up rule chunking with top-down rule segmentation in purely unsupervised learning of phrasal inversion transduction grammars yields significantly better translation accuracy than either search strategy alone. Previous approaches have relied on incrementally building larger rules by chunking smaller rules bottom-up; we introduce a complementary top-down model that incrementally builds shorter rules by segmenting larger rules. Specifically, we combine iteratively chunked rules from Saers et al. (2012) with our new iteratively segmented rules. The two integrate seamlessly because both stay strictly within a pure transduction grammar framework, so induction uses matching models during both training and testing. The alternative, decoding under a completely different model architecture from the one assumed during training, violates an elementary principle of machine learning and statistics. To drive induction top-down, we introduce a minimum description length objective that trades off maximum likelihood against model size. Empirically, combining the more liberal rule chunking model with the more conservative rule segmentation model produces significantly better translations than either strategy in isolation.
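The trade-off named in the abstract can be written in the standard two-part minimum description length form. This is a generic sketch rather than the paper's exact formulation: the symbols $\Phi$ (the induced grammar), $D$ (the parallel training corpus) and $\mathrm{DL}$ (description length in bits) are notation assumed here for illustration.

```latex
\mathrm{DL}(\Phi, D) = \mathrm{DL}(\Phi) + \mathrm{DL}(D \mid \Phi),
\qquad
\mathrm{DL}(D \mid \Phi) = -\log_2 P(D \mid \Phi)
```

Under this reading, segmenting a long rule into shorter reusable rules is worthwhile whenever the bits saved in the model term $\mathrm{DL}(\Phi)$ outweigh any bits added to the data term $\mathrm{DL}(D \mid \Phi)$.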
Similar resources
Segmenting vs. Chunking Rules: Unsupervised ITG Induction via Minimum Conditional Description Length
We present an unsupervised learning model that induces phrasal inversion transduction grammars by introducing a minimum conditional description length (CDL) principle to drive search over a space defined by two opposing extreme types of ITGs. Our approach attacks the difficulty of acquiring more complex longer rules when inducing inversion transduction grammars via unsupervised bottom-up chunki...
Iterative Rule Segmentation under Minimum Description Length for Unsupervised Transduction Grammar Induction
We argue that for purely incremental unsupervised learning of phrasal inversion transduction grammars, a minimum description length driven, iterative top-down rule segmentation approach that is the polar opposite of Saers, Addanki, and Wu’s previous 2012 bottom-up iterative rule chunking model yields significantly better translation accuracy and grammar parsimony. We still aim for unsupervised ...
Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
We present a novel model, Freestyle, that learns to improvise rhyming and fluent responses upon being challenged with a line of hip hop lyrics, by combining both bottom-up token-based rule induction and top-down rule segmentation strategies to learn a stochastic transduction grammar that simultaneously learns both phrasing and rhyming associations. In this attack on the woefully under-explored n...
Alternating Regular Tree Grammars in the Framework of Lattice-Valued Logic
In this paper, two different ways of introducing alternation for lattice-valued (referred to as L-valued) regular tree grammars and L-valued top-down tree automata are compared. One is the way which defines the alternating regular tree grammar, i.e., alternation is governed by the non-terminals of the grammar and the other is the way which combines state with alternation. The first way is ta...
Bayesian Induction of Bracketing Inversion Transduction Grammars
We present a novel approach to learning phrasal inversion transduction grammars via Bayesian MAP (maximum a posteriori) or information-theoretic MDL (minimum description length) model optimization so as to incorporate simultaneously the choices of model structure as well as parameters. In comparison to most current SMT approaches, the model learns phrase translation lexicons that (a) do not req...